NSF PAR Search | NSF Public Access Repository


PROMPT: A Fast and Extensible Memory Profiling Framework

https://doi.org/10.1145/3649827

Xu, Ziyang; Chon, Yebin; Su, Yian; Tan, Zujun; Apostolakis, Sotiris; Campanoni, Simone; August, David I (April 2024, Proceedings of the ACM on Programming Languages)

Memory profiling captures programs’ dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. This often results in a void where aspirations for fast and robust profilers remain unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported with PROMPT with all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was improved by 5.3× and 7.1×, respectively. To further underscore PROMPT’s impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
Full Text Available
PDIP: Priority Directed Instruction Prefetching

https://doi.org/10.1145/3620665.3640394

Godala, Bhargav Reddy; Ramesh, Sankara Prasad; Pokam, Gilles A; Stark, Jared; Seznec, Andre; Tullsen, Dean; August, David I (April 2024, ACM)

Modern server workloads have large code footprints, which are prone to front-end bottlenecks due to instruction cache capacity misses. Even with aggressive fetch-directed instruction prefetching (FDIP), as implemented in modern processors, there are still significant front-end stalls due to I-Cache misses. A major portion of misses that occur on a BPU-predicted path are tolerated by FDIP without causing stalls. Prior work on instruction prefetching, however, has not been designed to work with FDIP processors. Its singular goal is reducing I-Cache misses, whereas FDIP processors are designed to tolerate them. Designing an instruction prefetcher that works in conjunction with FDIP requires identifying the fraction of cache misses that impact front-end performance (those that are not fully hidden by FDIP) and targeting only them. In this paper, we propose Priority Directed Instruction Prefetching (PDIP), a novel instruction prefetching technique that complements FDIP by issuing prefetches only for targets where FDIP struggles: those along the resteer path of front-end stall-causing events. PDIP identifies these targets and associates them with a trigger for future prefetch. At a 43.5KB budget, PDIP achieves up to 5.1% IPC speedup on important workloads such as cassandra and a geomean IPC speedup of 3.2% across 16 benchmarks.
Full Text Available
GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction

https://doi.org/10.1109/ISCA59077.2024.00011

Chaturvedi, Ishita; Godala, Bhargav Reddy; Wu, Yucan; Xu, Ziyang; Iliakis, Konstantinos; Eleftherakis, Panagiotis-Eleftherios; Xydis, Sotirios; Soudris, Dimitrios; Sorensen, Tyler; Campanoni, Simone; et al (June 2024, IEEE)

Full Text Available
Revisiting Computation for Research: Practices and Trends

https://doi.org/10.1109/SC41406.2024.00076

Giordani, Jeremiah; Xu, Ziyang; Colby, Ella; Ning, August; Godala, Bhargav Reddy; Chaturvedi, Ishita; Zhu, Shaowei; Chon, Yebin; Chan, Greg; Tan, Zujun; et al (November 2024, IEEE)

Full Text Available
SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive Development

https://doi.org/10.1145/3582016.3582058

Tan, Zujun; Chon, Yebin; Kruse, Michael; Doerfert, Johannes; Xu, Ziyang; Homerding, Brian; Campanoni, Simone; August, David I. (March 2023, International Conference on Architectural Support for Programming Languages and Operating Systems)

Manually writing parallel programs is difficult and error-prone. Automatic parallelization could address this issue, but its profitability can be limited by the lack of facts known only to the programmer. A parallelizing compiler that collaborates with the programmer can increase the coverage and performance of parallelization while reducing the errors and overhead associated with manual parallelization. Unlike collaboration involving analysis tools that report program properties or make parallelization suggestions to the programmer, decompiler-based collaboration could leverage the strength of existing parallelizing compilers to provide programmers with a natural compiler-parallelized starting point for further parallelization or refinement. Despite this potential, existing decompilers fail to do this because they do not generate portable parallel source code compatible with any compiler of the source language. This paper presents SPLENDID, an LLVM-IR to C/OpenMP decompiler that enables collaborative parallelization by producing standard parallel OpenMP code. Using published manual parallelization of the PolyBench benchmark suite as a reference, SPLENDID's collaborative approach produces programs twice as fast as either Polly-based automatic parallelization or manual parallelization alone. SPLENDID's portable parallel code is also more natural than that from existing decompilers, obtaining a 39x higher average BLEU score.
Full Text Available
EMISSARY: Enhanced Miss Awareness Replacement Policy for L2 Instruction Caching

https://doi.org/10.1145/3579371.3589097

Nagendra, Nayana Prasad; Godala, Bhargav Reddy; Chaturvedi, Ishita; Patel, Atmn; Kanev, Svilen; Moseley, Tipp; Stark, Jared; Pokam, Gilles A.; Campanoni, Simone; August, David I. (June 2023, Proceedings of the 50th International Symposium on Computer Architecture (ISCA))

Full Text Available
NOELLE Offers Empowering LLVM Extensions

https://doi.org/10.1109/CGO53902.2022.9741276

Matni, Angelo; Deiana, Enrico Armenio; Su, Yian; Gross, Lukas; Ghosh, Souradip; Apostolakis, Sotiris; Xu, Ziyang; Tan, Zujun; Chaturvedi, Ishita; Homerding, Brian; et al (April 2022, 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO))

Modern and emerging architectures demand increasingly complex compiler analyses and transformations. As the emphasis on compiler infrastructure moves beyond support for peephole optimizations and the extraction of instruction-level parallelism, compilers should support custom tools designed to meet these demands with higher-level analysis-powered abstractions and functionalities of wider program scope. This paper introduces NOELLE, a robust open-source domain-independent compilation layer built upon LLVM providing this support. NOELLE extends abstractions and functionalities provided by LLVM enabling advanced, program-wide code analyses and transformations. This paper shows the power of NOELLE by presenting a diverse set of 11 custom tools built upon it.
Full Text Available
SCAF: a speculation-aware collaborative dependence analysis framework

https://doi.org/10.1145/3385412.3386028

Apostolakis, Sotiris; Xu, Ziyang; Tan, Zujun; Chan, Greg; Campanoni, Simone; August, David I. (June 2020, Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI))

Full Text Available
Perspective: A Sensible Approach to Speculative Automatic Parallelization

https://doi.org/10.1145/3373376.3378458

Apostolakis, Sotiris; Xu, Ziyang; Chan, Greg; Campanoni, Simone; August, David I. (March 2020, Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS))

Full Text Available
AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers

https://doi.org/10.1109/MM.2020.2986212

Nagendra, Nayana Prasad; Ayers, Grant; August, David I.; Cho, Hyoun Kyu; Kanev, Svilen; Kozyrakis, Christos; Krishnamurthy, Trivikram; Litz, Heiner; Moseley, Tipp; Ranganathan, Parthasarathy (May 2020, IEEE Micro)

Full Text Available
